Written Lexicon
Language Type:
Multilingual
Languages:
Bulgarian Catalan Chinese Dutch English Estonian Finnish Italian Portuguese Slovenian Spanish Swedish Thai and Turkish
Availability:
Freely Available
License:
Open Source
Size:
41,411 senses for Bulgarian, 35,820 for Swedish
Production Status:
Newly created-in progress
Use:
Word Sense Disambiguation
- Paper title: A Parallel WordNet for English, Swedish and Bulgarian
- Paper track: Written/poster presentation with demo
- Paper status: Accept Poster+Demo
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Krasimir Angelov | GF WordNet | N/A |
Documentation:
None
Written Evaluation Data
Language Type:
Multilingual
Languages:
Croatian English Estonian Finnish Latvian Lithuanian Russian Slovenian Swedish
Availability:
Freely Available
License:
CC-BY-SA
Size:
1,446,954 entries
Production Status:
Newly created-finished
Use:
Evaluation/Validation
- Paper title: Multilingual Culture-Independent Word Analogy Datasets
- Paper track: Evaluation/poster presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Matej Ulčar | Multilingual Culture-Independent Word Analogy Datasets | N/A |
Documentation:
None
Written Grammar/Language Model
Language Type:
Multilingual
Languages:
Croatian Estonian Finnish Latvian Lithuanian Slovenian Swedish
Availability:
Freely Available
License:
GPL v3
Size:
2.4 GB
Production Status:
Newly created-finished
Use:
Machine Learning
- Paper title: High Quality ELMo Embeddings for Seven Less-Resourced Languages
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Matej Ulčar | Embeddia ELMo embeddings models for seven languages | N/A |
Documentation:
No documentation beyond that of the original (English) ELMo model: https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md
Written Corpus
Language Type:
Multilingual
Languages:
Arabic Bulgarian Catalan Croatian Czech Danish Dutch English Estonian Filipino Finnish French German Greek Hebrew Hindi Hungarian Indonesian Italian Japanese Korean Latvian Lithuanian Malay Norwegian Persian Polish Portuguese Romanian Russian Serbian Simplified Chinese Slovak Slovenian Spanish Swedish Thai Traditional Chinese Turkish Ukrainian Vietnamese
Availability:
Freely Available
License:
CC-BY-SA
Size:
60 GB
Production Status:
Newly created-in progress
Use:
Language Modelling
- Paper title: Wiki-40B: Multilingual Language Model Dataset
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Rami Al-Rfou | Wiki40B-LM | N/A |
Documentation:
None
Written Corpus
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
Not specified
Size:
22.10 billion tokens
Production Status:
Existing-used
Use:
Machine Translation, Speech-to-Speech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | OpenSubtitles2018 | N/A |
Documentation:
Yes, on the website.
Written Lexicon
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
Creative Commons Attribution 4.0 International
Size:
41 GB
Production Status:
Newly created-finished
Use:
Machine Translation, Speech-to-Speech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | word2word | N/A |
Documentation:
Yes, on the website.
Multimodal/Multimedia Corpus
Language Type:
Monolingual
Languages:
Adyghe Albanian Ancient Greek Arabic Armenian Asturian Basque Belarusian Bulgarian Catalan Church Slavic Classical Syriac Classical Armenian Czech Danish Dutch English Estonian Faroese Finnish Georgian German Gothic Hindi Hungarian Icelandic Ingrian Irish Kabardian Kalaallisut Kannada Kazakh Khakas Latin Latvian Lithuanian Livonian languages Low German Lower Sorbian Macedonian Maltese Middle French Middle High German Middle Low German Modern Greek Neapolitan Northern Sami Occitan Old English Old French Old Irish Old Saxon Pashto Persian Polish Portuguese Romanian Slovenian Spanish Swedish Tibetan Turkish Turkmen Ukrainian Urdu Veps Votic Welsh
Availability:
Freely Available
License:
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Size:
557.3 MB
Production Status:
Newly created-in progress
Use:
Morphological Analysis
- Paper title: Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Eleni Metheniti | Wikinflection Corpus | N/A |
Documentation:
https://github.com/lenakmeth/Wikinflection-Corpus/blob/master/README.md
Written Corpus
Language Type:
Monolingual
Languages:
Arabic Azerbaijani Belarusian Bulgarian Catalan Danish English Estonian Filipino Finnish Hindi Hungarian Indonesian Irish Italian Japanese Kazakh Korean Latvian Lithuanian Mongolian Norwegian Polish Portuguese Russian Serbian (Latin) Slovenian Spanish Swedish Tamil Turkish Ukrainian Urdu Uzbek Vietnamese ces deu ell fas fra isl kat mkd nld ron slk sqi zho
Availability:
Freely Available
License:
GNU GPL v3
Size:
45 billion words
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: Geographically-Balanced Gigaword Corpora for 50 Language Varieties
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Jonathan Dunn | GeoWAC | N/A |
Documentation:
https://github.com/jonathandunn/earthlings
Written Corpus
Language Type:
Monolingual
Languages:
Slovenian
Availability:
From Data Center(s)
License:
Contract
Size:
1,134,693,933 words
Production Status:
Existing-updated
Use:
Corpus Creation/Annotation
- Paper title: Gigafida 2.0: The Reference Corpus of Written Standard Slovene
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Simon Krek | Gigafida | N/A |
Documentation:
https://www.cjvt.si/gigafida/wp-content/uploads/sites/10/2019/06/Gigafida2.0_specifikacije.pdf (in Slovene & English)
Written Corpus
Language Type:
Monolingual
Languages:
Adyghe Ancient Greek Anglo-Norman Arabic Asturian Azerbaijani Bangla Bashkir Belarusian Breton Bulgarian Catalan Central Kurdish Church Slavic Classical Armenian Classical Syriac Cornish Crimean Tatar Danish English Estonian Faroese Finnish Friulian Galolen Gothic Haida Hebrew Hindi Hungarian Ingrian Irish Italian Kabardian Kalaallisut Kannada Karelian Kashubian Kazakh Khakas Khaling Ladin Latin Latvian Lithuanian Livonian Livvi Low German Lower Sorbian Ludian Maltese Manx Mapuche Middle French Middle High German Middle Low German Murrinh-Patha Navajo Neapolitan No linguistic content Northern Frisian Northern Kurdish Northern Sami Norwegian Bokmål Norwegian Nynorsk Occitan Old English Old French Old Irish Old Saxon Pashto Polish Portuguese Quechua Russian Sanskrit Scottish Gaelic Serbian (Latin) Slovenian Spanish Swahili (Congo - Kinshasa) Swedish Tajik Tatar Telugu Turkish Turkmen Ukrainian Urdu Uzbek Venetian Veps Votic Western Frisian Yiddish Zulu bod ces cym deu ell eus fas fra hye isl kat mkd nld ron sqi
Availability:
Freely Available
License:
CC BY-SA 3.0
Size:
Not specified
Production Status:
Existing-updated
Use:
Morphological Analysis
- Paper title: UniMorph 3.0: Universal Morphology
- Paper track: Infrastructural Issues/Large Projects/oral presentation
- Paper status: Accept Poster
| Role | Name | Resource | Country |
|---|---|---|---|
| Main Contact | Ekaterina Vylomova | UniMorph | N/A |
Documentation:
None